Conversation

Copilot AI commented Dec 3, 2025

The repo was maintaining a patched fork of llama.cpp's server, which required manual patch maintenance on every llama.cpp version bump. Vanilla llama-server already supports /v1/models during model loading, so the fork is no longer needed.

Changes

  • CMakeLists.txt: enable LLAMA_BUILD_SERVER and LLAMA_BUILD_TOOLS to build the upstream llama-server, then copy it to com.docker.llama-server for binary-name compatibility (see the CMake excerpt below)
  • Removed llamacpp/native/src/server/: Deleted forked server.cpp, utils.hpp, httplib.h, server.patch, and build files
  • Updated README.md: Simplified llama.cpp version bump instructions (no more patching)
# Build vanilla llama-server from submodule
set(LLAMA_BUILD_COMMON ON CACHE BOOL "" FORCE)
set(LLAMA_BUILD_TOOLS ON CACHE BOOL "" FORCE)
set(LLAMA_BUILD_SERVER ON CACHE BOOL "" FORCE)
add_subdirectory(vendor/llama.cpp)

# Copy the binary to ${SERVER_OUTPUT_NAME} (com.docker.llama-server) for backwards compatibility
add_custom_target(docker-llama-server ALL
    DEPENDS llama-server
    COMMAND ${CMAKE_COMMAND} -E copy
        $<TARGET_FILE:llama-server>
        ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${SERVER_OUTPUT_NAME}
)
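
For reference, a build-and-verify sequence might look like the following; the build directory name and the bin/ output path are illustrative assumptions, not taken from the repo's actual build scripts:

$ git submodule update --init --recursive       # make sure vendor/llama.cpp is checked out
$ cmake -B build
$ cmake --build build --target docker-llama-server
$ ls build/bin/com.docker.llama-server          # renamed copy of the upstream llama-server

With the patch gone, a llama.cpp version bump should roughly reduce to moving the submodule pointer (the exact steps are whatever the updated README now describes):

$ cd vendor/llama.cpp
$ git fetch origin && git checkout <new-release-tag>
$ cd ../.. && git add vendor/llama.cpp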
Original prompt

This section details the original issue you should resolve

<issue_title>Unfork llama.cpp</issue_title>
<issue_description>We should be able to use the code, cmake files, etc. from llama.cpp as is:

docker#471</issue_description>

Comments on the Issue (you are @copilot in this section)

@ericcurtin @doringeman you seem to be in the know here. I can do this against vanilla llama-server (which we should be using) no problem, so what is different about the fork that we cannot do the same?
$ curl -k http://127.0.0.1:8080/v1/models
{"models":[{"name":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","model":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","modified_at":"","size":"","digest":"","type":"model","description":"","tags":[""],"capabilities":["completion"],"parameters":"","details":{"parent_model":"","format":"gguf","family":"","families":[""],"parameter_size":"","quantization_level":""}}],"object":"list","data":[{"id":"unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M","object":"model","created":1764770422,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":151936,"n_ctx_train":262144,"n_embd":2048,"n_params":30532122624,"size":18550716416}}]}

Copilot AI changed the title from "[WIP] Unfork llama.cpp to enable usage as is" to "Unfork llama.cpp: use vanilla llama-server directly" on Dec 3, 2025
Copilot AI requested a review from ericcurtin December 3, 2025 19:35